Skip to content

[pull] master from git:master#118

Merged
pull[bot] merged 23 commits intoturkdevops:masterfrom
git:master
Oct 23, 2025
Merged

[pull] master from git:master#118
pull[bot] merged 23 commits intoturkdevops:masterfrom
git:master

Conversation

@pull
Copy link
Copy Markdown

@pull pull bot commented Oct 23, 2025

See Commits and Changes for more details.


Created by pull[bot] (v2.0.0-alpha.4)

Can you help keep this open source service alive? 💖 Please sponsor : )

bk2204 and others added 23 commits October 9, 2025 17:46
Our current pack index v3 format uses 4-byte integers to find the
trailer of the file.  This effectively means that the file cannot be
much larger than 2^32.  While this might at first seem to be okay, we
expect that each object will have at least 64 bytes worth of data, which
means that no more than about 67 million objects can be stored.

Again, this might seem fine, but unfortunately, we know of many users
who attempt to create repos with extremely large numbers of commits to
get a "high score," and we've already seen repositories with at least 55
million commits.  In the interests of gracefully handling repositories
even for these well-intentioned but ultimately misguided users, let's
change these lengths to 8 bytes.

For the checksums at the end of the file, we're producing 32-byte
SHA-256 checksums because that's what we already do with pack index v2
and SHA-256.  Truncating SHA-256 doesn't pose any actual security
problems other than those related to the reduced size, but our pack
checksum must already be 32 bytes (since SHA-256 packs have 32-byte
checksums) and it simplifies the code to use the existing hashfile logic
for these cases for the index checksum as well.

In addition, even though we may not need cryptographic security for the
index checksum, we'd like to avoid arguments from auditors and such for
organizations that may have compliance or security requirements.  Using
the simple, boring choice of the full SHA-256 hash avoids all possible
discussion related to hash truncation and removes impediments for these
organizations.

Note that we do not yet have a pack index v3 implementation in Git, so
it should be fine to change this format.  However, such an
implementation has been written for future inclusion following this
format.

Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The current design of pack index v3 has items in two different orders:
sorted shortened object ID order and pack order.  The shortened object
IDs and the pack index offset values are in the former order and
everything else is in the latter.

This, however, poses some problems.  We have many parts of the packfile
code that expect to find out data about an object knowing only its index
in pack order.  With the current design, to find the pack offset after
having looked up the index in pack order, we must then look up the full
object ID and use that to look up the shortened object ID to find the
pack offset, which is inconvenient, inefficient, and leads to poor cache
usage.

Instead, let's change the offset values to be looked up by pack order.
This works better because once we know the pack order offset, we can
find the full object name and its location in the pack with a simple
index into their respective tables.  This makes many operations much
more efficient, especially with the functions we already have, and it
avoids the need for the revindex with pack index v3.

Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The documentation for the hash function transition reflects the original
design where the SHA-256 signature would always be placed in a header.
However, due to a missed patch in Git 2.29, we shipped SHA-256 support
such that the signature for the current algorithm is always an in-body
signature and the opposite algorithm is always in a header.  Since the
documentation is inaccurate, update it to reflect the correct
information.

Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
It is fair to say that our pack and indexing code is quite complex.
Contributors who wish to work on this code or implementors of other
implementations would benefit from clear, unambiguous documentation
about how our data formats are structured and encoded and what data is
used in the computation of certain values.  Unfortunately, some of this
data is missing, which leads to confusion and frustration.

Let's document some of this data to help clarify things.  Specify over
what data CRC32 values are computed and also note which CRC32 algorithm
is used, since Wikipedia mentions at least four 32-bit CRC algorithms
and notes that it's possible to use different bit orderings.

In addition, note how we encode objects in the pack.  One might be led
to believe that packed objects are always stored with the "<type>
<size>\0" prefix of loose objects, but that is not the case, although
for obvious reasons this data is included in the computation of the
object ID.  Explain why this is for the curious reader.

Finally, indicate what the size field of the packed object represents.
Otherwise, a reader might think that the size of a delta is the size of
the full object or that it might contain the offset or object ID,
neither of which are the case.  Explain clearly, however, that the
values represent uncompressed sizes to avoid confusion.

Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
We currently have no documentation for how loose objects are stored.
Let's add some here so it's easy for people to understand how they
work.

Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Right now, we have a way to print the storage hash, the input hash, and
the output hash, but we lack a way to print the compatibility hash.  Add
a new type to --show-object-format, compat, which prints this value.

If no compatibility hash exists, simply print a newline.  This is
important to allow users to use multiple options at once while still
getting unambiguous output.

Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When we're creating a tag, we want to make sure that gpgsig and
gpgsig-sha256 headers are allowed for the commit.  The default fsck
behavior is to ignore the fact that they're left over, but some of our
tests enable strict checking which flags them nonetheless.  Add
improved checking for these headers as well as documentation and several
tests.

Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
We want to specify a compatibility hash for testing interactions for
SHA-256 repositories where we have SHA-1 compatibility enabled.  Allow
the user to specify this scenario in the test suite by setting
GIT_TEST_DEFAULT_HASH to "sha256:sha1".

Note that this will get passed into GIT_DEFAULT_HASH, which Git itself
does not presently support.  However, we will support this in a future
commit.

Since we'll now want to know the value for a specific version, let's add
the ability to specify either the storage hash (in this case, SHA-256)
or the compatibility hash (SHA-1).  We use a different value for the
compatibility hash that will be enabled for all repositories
(test_repo_compat_hash_algo) versus the one that is used individually in
some tests (test_compat_hash_algo), since we want to still run those
individual tests without requiring that the testsuite be run fully in a
compatibility mode.

In some cases, we'll need to adjust our test suite to work in a proper
way with a compatibility hash.  For example, in such a case, we'll only
use pack index v3, since v1 and v2 lack support for multiple algorithms.
Since we won't want to write those older formats, we'll need to skip
tests that do so.  Let's add a COMPAT_HASH prerequisite for this
purpose.

Finally, in this scenario, we can no longer rely on having broken
objects work since we lack compatibility mappings to rewrite objects in
the repository.  Add a prerequisite, BROKEN_OBJECTS, that we define in
terms of COMPAT_HASH and checks to see if creating deliberately broken
objects is possible, so that we can disable these tests if not.

Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When hash compatibility mode is enabled, we cannot write broken objects
because they cannot be mapped into the other hash algorithm.  Use the
BROKEN_OBJECTS prerequisite to disable these tests and the writing of
broken objects in this mode.

Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
From user feedback:

- it's confusing that we use both <branch> and <refspec> to refer to the
  second argument
- one user is not clear about what `refs/heads/*:refs/remotes/origin/*`
  is meant to be an example of ("is it like a path?")

The DESCRIPTION section is also doing a lot right now: it's trying to
describe both how the <repository> and <refspec> arguments work (which
is pretty complex, as seen in the DEFAULT BEHAVIOUR section)
as well as how `git pull` calls `git fetch` and merge/rebase/etc
depending on the arguments.

Handle this by moving the description of the <repository> and <refspec>
arguments to the OPTIONS section, so that we can focus on the
merge/rebase/etc behaviour in the DESCRIPTION section, and refer folks
to the later sections for details.

Use the term "upstream" instead of 'the "remote" and "merge"
configuration for the current branch' since users are more likely to
know what an "upstream" is.

Signed-off-by: Julia Evans <julia@jvns.ca>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
From user feedback:

- One user is confused about the current default ("I was convinced that
  the git default was still to merge on pull")
- One user is confused about why "git fetch" isn't mentioned earlier
- One user says they always forget what the arguments to `git pull` are
  and that it's not immediately obvious that `--no-rebase` means "merge"
- One user wants `--ff-only` to be mentioned

Resolve this by listing the options for integrating the the remote
branch. This should help users figure out at a glance which one they
want to do, and make it clearer that --ff-only is the default.

Signed-off-by: Julia Evans <julia@jvns.ca>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
From user feedback: this example is confusing because it implies that
`git pull` will run `git merge` by default, but the default is
`--ff-only`.

We could instead show an example of a fast-forward merge, but that may
not add a lot since fast-forward merges are relatively simple. This lets
us keep the description short.

Signed-off-by: Julia Evans <julia@jvns.ca>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
From user feedback:

- One user is confused about why `git reset --merge`
  (why not just `git reset`?). Handle this by mentioning
  `git merge --abort` and `git reset --abort` instead, which have a
  more obvious meaning.
- 2 users want to know what "In older versions of Git" means exactly
  (in versions older than 1.7.0). Handle this by removing the warning
  since it was added 15 years ago (in 3f8fc18)

Signed-off-by: Julia Evans <julia@jvns.ca>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Update old-style shell path checks to use the modern test
helpers 'test_path_is_file' and 'test_path_is_dir' for improved
runtime diagnosis.

Signed-off-by: Solly <solobarine@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Bumps
[actions/download-artifact](https://github.com/actions/download-artifact)
from 4 to 5.
- [Release notes](https://github.com/actions/download-artifact/releases)
- [Commits](actions/download-artifact@v4...v5)

Signed-off-by: Junio C Hamano <gitster@pobox.com>
Bumps [actions/checkout](https://github.com/actions/checkout) from 4 to 5.
- [Release notes](https://github.com/actions/checkout/releases)
- [Changelog](https://github.com/actions/checkout/blob/main/CHANGELOG.md)
- [Commits](actions/checkout@v4...v5)

Signed-off-by: Junio C Hamano <gitster@pobox.com>
Bumps [actions/setup-python](https://github.com/actions/setup-python)
from 5 to 6.
- [Release notes](https://github.com/actions/setup-python/releases)
- [Commits](actions/setup-python@v5...v6)

Signed-off-by: Junio C Hamano <gitster@pobox.com>
Bumps [actions/github-script](https://github.com/actions/github-script)
from 7 to 8.
- [Release notes](https://github.com/actions/github-script/releases)
- [Commits](actions/github-script@v7...v8)

Signed-off-by: Junio C Hamano <gitster@pobox.com>
CI update.

* js/ci-github-actions-update:
  build(deps): bump actions/github-script from 7 to 8
  build(deps): bump actions/setup-python from 5 to 6
  build(deps): bump actions/checkout from 4 to 5
  build(deps): bump actions/download-artifact from 4 to 5
The beginning of SHA1-SHA256 interoperability work.

* bc/sha1-256-interop-01:
  t1010: use BROKEN_OBJECTS prerequisite
  t: allow specifying compatibility hash
  fsck: consider gpgsig headers expected in tags
  rev-parse: allow printing compatibility hash
  docs: add documentation for loose objects
  docs: improve ambiguous areas of pack format documentation
  docs: reflect actual double signature for tags
  docs: update offset order for pack index v3
  docs: update pack index v3 format
Documentation updates.

* je/doc-pull:
  doc: git-pull: clarify how to exit a conflicted merge
  doc: git-pull: delete the example
  doc: git-pull: clarify options for integrating remote branch
  doc: git-pull: move <repository> and <refspec> params
Test modernization.

* so/t2401-use-test-path-helpers:
  t2401: update path checks using test_path helpers
Signed-off-by: Junio C Hamano <gitster@pobox.com>
@pull pull bot locked and limited conversation to collaborators Oct 23, 2025
@pull pull bot added the ⤵️ pull label Oct 23, 2025
@pull pull bot merged commit c54a18e into turkdevops:master Oct 23, 2025
1 check passed
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants